Beyond BatchNorm: Towards a Unified Understanding of Normalization in Deep Learning
Inspired by BatchNorm, there has been an explosion of normalization layers in deep learning. Recent works have identified a multitude of beneficial properties in BatchNorm to explain its success. However, given the pursuit of alternative normalization layers, these properties need to be generalized so that any given layer's success or failure can be accurately predicted. In this work, we take a first step towards this goal by extending known properties of BatchNorm in randomly initialized deep neural networks (DNNs) to several recently proposed normalization layers. Our primary findings are as follows: (i) similar to BatchNorm, activations-based normalization layers can prevent exponential growth of activations in ResNets, but parametric techniques require explicit remedies; (ii) use of GroupNorm can ensure an informative forward propagation, with different samples being assigned dissimilar activations, but increasing group size results in increasingly indistinguishable activations for different samples, explaining the slow convergence of models with LayerNorm; and (iii) small group sizes result in large gradient norms in earlier layers, explaining the training instability of Instance Normalization and illustrating a speed-stability tradeoff in GroupNorm. Overall, our analysis reveals a unified set of mechanisms that underpin the success of normalization methods in deep learning, providing us with a compass to systematically explore the vast design space of DNN normalization layers.
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
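As a rough illustration of the group-size spectrum the abstract describes, the sketch below (my own, not the paper's code) uses PyTorch's `nn.GroupNorm`: with a single group it reproduces LayerNorm statistics, and with one group per channel it reproduces InstanceNorm, so the group count interpolates between the two regimes the findings contrast.

```python
import torch
import torch.nn as nn

# Illustrative sketch (not from the paper): GroupNorm's group count spans the
# spectrum from LayerNorm (one group) to InstanceNorm (one channel per group).
torch.manual_seed(0)
x = torch.randn(8, 32, 16, 16)  # (batch, channels, height, width)

gn_one_group = nn.GroupNorm(num_groups=1, num_channels=32)     # all channels in one group
gn_per_channel = nn.GroupNorm(num_groups=32, num_channels=32)  # each channel its own group
layer_norm = nn.LayerNorm([32, 16, 16])
instance_norm = nn.InstanceNorm2d(32)

# At initialization the affine parameters are the identity, so the statistics coincide:
print(torch.allclose(gn_one_group(x), layer_norm(x), atol=1e-5))       # True
print(torch.allclose(gn_per_channel(x), instance_norm(x), atol=1e-5))  # True
```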
Constructing Cell-type Taxonomy by Optimal Transport with Relaxed Marginal Constraints
Sebastian Pena, Lin Lin, Jia Li
The rapid emergence of single-cell data has facilitated the study of many different biological conditions at the cellular level. Cluster analysis has been widely applied to identify cell types, capturing the essential patterns of the original data in a much more concise form. One challenge in the cluster analysis of cells is matching clusters extracted from datasets of different origins or conditions. Many existing algorithms cannot recognize new cell types present in only one of the two samples when establishing a correspondence between clusters. Additionally, when there are more than two samples, it is advantageous to align clusters across all samples simultaneously rather than performing pairwise alignment. Our approach constructs a taxonomy for cell clusters across all samples to better annotate these clusters and effectively extract features for downstream analysis. We develop a new system for constructing this cell-type taxonomy by combining Optimal Transport with Relaxed Marginal Constraints (OT-RMC) with simultaneous alignment of clusters across multiple samples. OT-RMC allows us to address the challenges that arise when cluster proportions vary substantially between samples or when some clusters do not appear in all samples. Experiments on more than twenty datasets demonstrate that the taxonomy constructed by this system yields highly accurate annotation of cell types, and that sample-level features extracted from the taxonomy support accurate classification of samples.
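A minimal sketch of the relaxed-marginal idea, using the POT library's unbalanced Sinkhorn solver rather than the authors' OT-RMC implementation; the centroids, proportions, and regularization values are invented for illustration. Relaxing the marginals lets mass be created or destroyed, so a cluster present in only one sample is not forced onto a poor match.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)

# Hypothetical setup: sample A has 4 clusters; sample B shares 3 of them
# and contains one novel, distant cluster.
rng = np.random.default_rng(0)
centroids_a = rng.normal(size=(4, 10))
centroids_b = np.vstack([centroids_a[:3] + 0.05 * rng.normal(size=(3, 10)),
                         rng.normal(5.0, 1.0, size=(1, 10))])  # novel cluster

prop_a = np.array([0.4, 0.3, 0.2, 0.1])      # cluster proportions in A
prop_b = np.array([0.25, 0.35, 0.15, 0.25])  # proportions in B

M = ot.dist(centroids_a, centroids_b)  # pairwise squared-Euclidean cost
# reg: entropic smoothing; reg_m: marginal-relaxation strength (lower = more relaxed)
plan = ot.sinkhorn_unbalanced(prop_a, prop_b, M, reg=0.05, reg_m=1.0)

print(np.round(plan, 3))  # little mass flows into B's novel (distant) cluster column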
EMS-SD: Efficient Multi-sample Speculative Decoding for Accelerating Large Language Models
Yunsheng Ni, Chuanjian Liu, Yehui Tang, Kai Han, Yunhe Wang
Speculative decoding has emerged as a pivotal technique for accelerating inference in Large Language Models (LLMs). Despite recent research aiming to improve prediction efficiency, multi-sample speculative decoding has been overlooked because the number of accepted tokens varies across samples within a batch during the verification phase. The vanilla method adds padding tokens to keep the number of new tokens consistent across samples, but this increases computational and memory-access overhead and reduces the speedup ratio. We propose a novel method that resolves the inconsistency in tokens accepted by different samples without increasing memory or compute overhead. Furthermore, our method handles the situation where the predicted tokens of different samples are inconsistent without adding padding tokens. Extensive experiments demonstrate the efficacy of our method. Our code is available at https://github.com/niyunsheng/EMS-SD.
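The batching problem the abstract describes can be sketched in a few lines. This toy example is not the EMS-SD implementation (see the linked repository for that); it only contrasts padding with a padding-free layout that keeps per-sample offsets, and the token IDs are made up.

```python
import torch

# After verification, each sample in the batch accepts a different number of tokens.
accepted = [torch.tensor([11, 12, 13]),  # sample 0 accepted 3 tokens
            torch.tensor([21]),          # sample 1 accepted 1 token
            torch.tensor([31, 32])]      # sample 2 accepted 2 tokens

# Vanilla approach: pad every sample to the longest acceptance, spending compute
# and KV-cache memory on pad positions.
padded = torch.nn.utils.rnn.pad_sequence(accepted, batch_first=True, padding_value=0)
print(padded)  # shape (3, 3), with pad tokens in rows 1 and 2

# Padding-free alternative: flatten the tokens and keep per-sample offsets, so the
# next forward pass touches only real tokens.
flat = torch.cat(accepted)                        # tensor([11, 12, 13, 21, 31, 32])
lengths = torch.tensor([len(t) for t in accepted])
offsets = torch.cumsum(lengths, dim=0) - lengths  # start index per sample: [0, 3, 4]
print(flat, offsets)
```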
GVP: Generative Volumetric Primitives
Mallikarjun B R, Xingang Pan, Mohamed Elgharib, Christian Theobalt
Advances in 3D-aware generative models have pushed the boundary of image synthesis with explicit camera control. To achieve high-resolution image synthesis, several attempts have been made to design efficient generators, such as hybrid architectures with both 3D and 2D components. However, such a design compromises multiview consistency, and the design of a pure 3D generator with high resolution remains an open problem. In this work, we present Generative Volumetric Primitives (GVP), the first pure 3D generative model that can sample and render 512×512 images in real time. GVP jointly models a number of volumetric primitives and their spatial information, both of which can be efficiently generated via a 2D convolutional network. The mixture of these primitives naturally captures the sparsity and correspondence in the 3D volume. Training such a generator with a high degree of freedom is made possible through a knowledge distillation technique. Experiments on several datasets demonstrate the superior efficiency and 3D consistency of GVP over the state of the art.
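A hypothetical sketch of the core representation, under my own simplifying assumptions (uniform primitive sizes, random grid contents standing in for the 2D generator's output): a mixture of small voxel grids placed in space, where querying a point sums the densities of the primitives that cover it, reflecting the sparsity the abstract mentions.

```python
import torch
import torch.nn.functional as F

K, R = 16, 8                        # number of primitives, per-primitive voxel resolution
centers = torch.rand(K, 3) * 2 - 1  # primitive centers in [-1, 1]^3
half_size = 0.25                    # half extent of each primitive's box
voxels = torch.rand(K, 1, R, R, R)  # per-primitive density grids (stand-in for generator output)

def query_density(points):          # points: (N, 3) in world coordinates
    density = torch.zeros(points.shape[0])
    for k in range(K):
        local = (points - centers[k]) / half_size  # map into the primitive's unit box
        inside = (local.abs() <= 1).all(dim=-1)    # only points this primitive covers
        if inside.any():
            grid = local[inside].view(1, 1, 1, -1, 3)  # grid_sample expects (N, D, H, W, 3)
            vals = F.grid_sample(voxels[k:k + 1], grid, align_corners=True)
            density[inside] += vals.view(-1)           # mixture = sum of primitive densities
    return density

pts = torch.rand(1024, 3) * 2 - 1
print(query_density(pts).shape)  # torch.Size([1024])
```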
The Role of Resampling Techniques in Data Science - KDnuggets
When working with models, remember that different algorithms learn from data in different ways. Fitting a model to the patterns in a given dataset is known as training the model. The model is then evaluated on the testing dataset, data it has not seen before. The goal is an optimal performance level where the model produces accurate outputs on both the training and testing datasets. You may have also heard of the validation set.
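In scikit-learn, the splits described above look like this; the 80/20 and 75/25 fractions are just a common choice, not a rule.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% of the data as the unseen testing set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Carve a validation set out of the training data for tuning decisions,
# keeping the test set untouched until the final evaluation.
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 90 30 30
```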